e,  LOGARITHMS,  AND  THE  NORMAL  DISTRIBUTION 


Dallas  R.  Hodgins 


Medical  Information  Systems  Department 
Naval  Health  Research  Center 
P.  0.  Box  85122 

San  Diego,  California  ^21 38  9174 


Accession  For 

NTIS  GRA&I 
DTIC  TAB 
Unannounced  □ 
Justification— 


By- 


Distribution/ 


'  o  TIC  \  j 

copy  I  ! 

Dlst 

INSPCCTEO J 

1 

M 

Availability  Codes 
jAvai*L  and/or 
Special 


Report  88-9,  supported  by  the  Nava]  Medical  Research  and  Development  Command, 
Department  of  the  Navy,  under  work  unit  MO095. "05-1053.  The  views  expressed 
in  this  article  are  those  of  the  author  and  do  net  [eflert  the  official  policy 
or  position  of  t he  Department  of  the  Navy,  Department  ol  Defense,  nor  the  U.S. 
Government.  Approved  for  public  release,  d  i  s 1 1  i  bn  t  i  on  unlimited. 


The  author  wishes  to  acknowledge  the  able  assistance  of  Kathryn  Medrano  in 
preparing  this  paper. 


Suanary 

Fast,  accurate  numerical  algorithms  for  ex  and  the  logarithms  of  numbers 
are  necessary  to  develop  useful  statistical  models  such  as  a  rational  poly¬ 
nomial  approximation  of  the  normal  probability  density  function  integral. 

Exploiting  the  string  functions  $EXTRACT,  $FIND,  and  $ LENGTH  of  the  MUMPS 
programming  language,  extremely  precise  algorithms  are  presented  for  ex,  the 
natural  and  common  log  of  N,  the  error  function,  and  the  normal  probability 
density  function. 


The  standardi2ed  normal  variable  distribution  routine  presented  is  accur¬ 
ate  to  1.5  parts  in  ten  million  -  affording  the  analyst  comfortable  margins  in 
models  requiring  extensive  numeric  manipulation. 
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e  ,  LOGARITHMS,  AND  THE  NORMAL  DISTRIBUTION 

Dallas  R.  Hodgins 

Introduction 

Since  the  Veteran's  Administration  File  Manager  (FM)  does  not  presently 
have  a  very  wide  selection  of  statistical  tools,  the  Naval  Health  Research 
Center  initiated  a  program  to  enhance  FM's  statistical  capabilities.  To  date, 
routines  for  basic  descriptive  statistics*,  a  random  number  generator,  multi¬ 
ple  regression,  and  the  routines  presented  herein  have  been  completed.  The 
philosophy  has  been  to  develop  accurate,  concise  structures  that  industrial 
hygienists  in  shipyards,  medical  workers,  and  administrators  can  easily  and 
comfortably  use  in  dealing  with  small  samples. 

Much  emphasis  has  been  placed  on  using  FM  telationally .  The  dovetailing 
of  relational  data  structure,  the  algorithms,  and  the  coding  is  a  remarkable 
outcome  of  the  essential  linearity  of  a  relational  logical  view  of  the  data. 
The  routines  presented  in  this  paper  are  algebraic  structures  written  using 
linear  algebra  notation.  The  correlation  between  the  mathematical  models  used 
and  the  computer  models  developed  is  emphasized  -  not  to  pontificate,  but  to 
stress  the  inherent  integrity  of  the  methods  and  to  underscore  the  accuracy 
and  conciseness  attainable  in  routines  written  in  MUMPS.  MUMPS  is  a  flexible 
media  that  not  only  allows  great  latitude  of  °xpression,  but  is  logically 
appealing,  allowing  code  structures  that  are  aesthetically  attractive. 

The  goal  is  to  place  in  the  hands  of  researchers  tire  ability  to  sample 
their  data,  ordet  it,  and  derive  basic  infeteiues  without  the  "blackbox" 
alienation  induced  by  the  magnificent,  but  o'tr  helming,  services  of  some  of 
the  popular  commercial  packages.  FM  and  the  programs  in  this  paper  are  in  the 
public  domain.  The  source  code  is  presented  f<M  all  to  critique  -  which  is 
the  pleasure  and  challenge  of  an  open  forum. 

The  progression  of  this  work  has  been  to  Ic  elop  the  elementary  measures 
of  central  tendency  witlr  an  emphasis  on  indexing  so  that  in  sorting,  tot  exam 
pie,  entities  can  be  compared  across  domains,  nr  -ing  data  bandied  gracefully, 
and  fast  processing  achieved.  The  taming  of  ^  in  this  paper  .  combined  with 
the  existing  programs,  allows  the  researcher  to  deal  with  probability  distii 
butions  germane  to  his  actual  data.  For  example,  tire  mean  and  -  tandard  devia¬ 
tion  of  the  weight  of  the  men  (ot  women)  ol  a  nrvticulat  shipyard  can  now  Ire 
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computed,  and  the  probability  of  the  occurrence  of  an  individual  weight  can  be 
computed  at  once  using  NORMDIST. 


Methods  and  Materials 

The  routines  were  developed  using  Intersys terns  MUMPS  M/VX  3.1  on  a  Digital 
Equipment  Corporation  (DEC)  VAX  11/ /85  machine.  The  programs  meet  or  exceed 
the  accuracy  stated  and  have  been  thoroughly  res  red  against  values  published 
in  the  Handbook  of  Mathematical  Functions  of  National  Bureau  of  Standards 
(Applied  Mathematics  Series  55,  ninth  printing.  November  1970).  In  the  dis¬ 
cussion  of  the  routines,  N  is  any  real  number.  The  discrepancy  between  the 
symbols  used  in  the  discussion  and  the  symbol'1  in  the  routines  stems  from 
using  conventional  mathematical  notation  to  "rite  the  algorithms  ("x"  is 
always  the  working  variable,  etc.)  and  use  of  symbols  in  the  meta  mathematical 
language  that  convey  meaningful  images. 

We  shall  develop  an  algorithm  to  produce  natural  logarithms  (In),  then  a 
program  to  yield  a  number  if  given  its  logarithm,  and  finally  relate  these 
inverse  processes  to  the  normal  probability  density  function.  While  the 
models  are  rigorous  in  a  mathematical  sense,  the  exposition  is  not. 

It  is  of  great  convenience  to  represent  numbers,  N,  by  raising  e  (the 
Naperian  or  natural  logarithm  base:  e  ^  lim(l^l  n)n  as  n  goes  to  infinity)  to 
a  power  b: 


N  =  e 


By  definition,  the  In  of  N  is  b 

In  N  -  b  In  e  =  b 

as  the  In  e  =  1  (if  N  =  e,  we  have  e  =  e^  so  In  e  -  b  In  e,  or  b  -  1;  e  -  p^). 

Given  N,  we  want  an  expression  that  will  produce  b,  the  In  of  N.  We  could 
use  Taylor's  series  expansion  for 

l/(l+x)  =1  -  x  +  x^  -  x^  ...  for  |x|<l 

and  integrate  term  by  term  to  obtain 

ln(l+x)  =  x  -  x^/2  r  x^/3  x^/-*  ...  for  |x|<l 

The  problem  with  this  method  is  its  slow  con’s’ nonce  which  leads  to  the  need 
of  many  terms,  thereby  increasing  the  possibility  of  errot  .  Hastings- 
approximated  this  powei  series  with  the  polynomial 

ln(ltx)  =  a^x  i  a2x  1  a •••  agx  *  E(x)  (1) 


for  0  <  x  <  1  where  |e(x)j  <  3E-R  and 


a1  =  .99999  64239 
a2  =  -.49987  41238 
a3  =  .33179  90258 


a4  = 


.24073  38084 
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.16765  40711 
.09532  93897 
.03608  84937 
.00645  35442 


The  discerning  reader  will  note  that  x  greater  than  1.0  are  not  fit  for 
the  formula.  What  do  we  do  when  we  want  the  In  of  a  number  greater  than  2? 


vt 
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We  must  convert  numbers  greater  than  2  to  the  form  (l+.bbb...)  and  subsequent 
ly  retransform  the  value  obtained  from  equation  (1)  (this  is  the  only  note 
worthy  activity  in  the  routine  LOG). 

Turning  to  LOG  (figure  1),  let  us  step  thiough  the  process  as  we  look  at 
the  arithmetic  for  the  In  ot  N  =  22.345.  At  CHAR' LOG  we  set  Z  =  22.345  the  N 
selected  at  SEL.  The  hub  of  the  game  in  logarithms  is  dealing  with  the 
decimal  position  of  N.  The  MUMPS  SFINP( Z, " . " )  function  returns  0  if  there  is 


LOG  ; NAPERIAN  AND  BRIGGSIAN  LOGARITHMS ,DRH,NHRC, 1/12/88 

; POLYNOMIAL  APPROXIMATION  -  ABSOLUTE  ERROR  LESS  THAN  OR  EQUAL  TO 

3*10E-8) 

S  N(1)=0,N(2)=. 6931471806, N(3)=l. 0986122887, N(4)=l. 3862943611, N(5)= 

1. 6094379124, N(6)=l. 7917594692, N(7)=l. 9459101491, N(8)=2. 0794415417, N(9)= 
2.1972245773 

SEL  S  L=0  R  ! ! , "WHAT  ARGUMENT  ?  (USE  DECIMAL  POINT):  ",X  I  X=""  K  C1,I.,X 

Q 

I  X<0!(X=0)!(X'?.N1".".N)  W  !, "NUMBER  GREATER  THAN  0  NO  COMMAS 
USE  DECIMAL  POINT"  G  SEL 
CHAR  S  Z=X,M=$F(Z, " . " ) 

I  M=0  S  C=$L(Z)  1  F  J=l: 1 :C  S  X=X/10 
I  M>2  S  C=M-3  F  J=1:1:C  S  X  X/10 
I  M=2  S  TM=$L(Z)  1  F  J=1:1:TM  S  X- 10*X 

I  M=2  S  TM2=$L(X),C=-<TH-TM7.rl)  F  J-l : 1 : (TM2- 1 )  S  X=X/10 
S  C1=$E(X, 1) ,X=X/C1- 1 

LnX  S  T(0)=1 ,T(1)=X  F  J=l:l:8  S  T(J)=T(1)*T(J  1) 

S  L= ( . 9999964239*T( 1 ) )- ( . 4998 74 1 238*T( 2 ) )  +  ( . 331 7990258*T( 3 ) ) - 
( .2407338084*T(4))  +  ( . 167654071 1*T( 5) )-( . 095329389 7*T(6) )+( .0360884937*T( 7) )  - 
( .0064535442*T(8) ) 

W  !  , "THE  LOG, BASE  10, OF  ",Z,"  IS  " ,$J(  . 4342.94481 9*(L.N(C1 ) ♦ 

(C*2. 302585093)), 10,8) 

W  ! , "THE  LOG, BASE  e,  OF  ",Z,"  IS  ",$J( (L+N(C1 ) .(C*2 . 302585093) ),  10, 8) 
K  A,C,J,L,M,S,T,TM,TM2,Z  G  SEL 
Q 


Figure  1.  Natural  Logarithm 
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no  decimal  point  in  the  number,  2  if  N  is  of  the  form  .xxx...,  and  an  integer 
equal  to  two  more  than  the  number  of  integers  preceding  the  point  in  all  other 
cases — $F(22.345,".")=4.  Alternate  paths  to  this  information  are  more  diffi¬ 
cult  without  direct  access  to  the  registers,  as  any  Fortran  programmer  will 
testify.  For  instance,  you  can  replace  $F  with 

S  N=22. 345  F  J=l:l  S  C=J,Y=N/10,N=Y  I  Y<1  S  C=C-1  W  !,C  Q 
and  a  similar  line  for  numbers  less  than  1.  Setting  M  =  $F(Z,".")  has  three 
possibilities : 

1)  when  M=0,  there  is  no  decimal  point  in  Z;  therefore  the  characteristic 
of  Z  is  $LENGTH(Z)-1  (logarithms  are  traditionally  reported  as 
.aaa...Ec  (where  E  stands  for  "exponent").  The  part  .aaa...  is  called 
the  mantissa.  Ec  is  10  (which  places  the  decimal)  with  c  being  known 
as  the  "characteristic"); 

2)  if  M>2,  there  is  at  least  one  digit  in  front  of  the  decimal.  We  set 

c=M-3,  as  we  want  Z  in  the  form  a. aaa...  as  the  ln(a.aaa...)  is 

-bbb....  We  subtract  three  in  order  to  leave  a  digit,  avoiding 

counting  the  decimal,  and  to  account  for  the  fact  $F  goes  one  beyond 

£ 

the  decimal.  Knowing  the  characteristic  we  now  divide  Z  by  10 

F  J=1:1:C  S  Z=Z/10 

to  yield  2.2345E1  the  logarithmic  form  we  seek. 

3)  M=2  means  we  have  a  number  like  .aaa...  oi  .  000 .  .  .  aaa .  .  .  which  is 

slightly  more  complicated.  If  we  wanted  ln( . 000123)  we  would  do  the 
following:  S  TM=$L( .000123)-1  which  equals  6;  multiply  .000123  by  10 

six  times  (see  CHAR  +  3)  giving  us  123;  setting  TM2  =  $I,( 1 ? 3 ) =  3  allows  us 
to  calculate  c,  c=-(TM-TM2+l )  or  c  =  -(6-3+l  )  =  -4  giving  us  1.23E-4,  the 
form  we  desire  for  ln(. 000123). 

At  this  point,  we  have  the  code  to  reduce  all  numbers  to  the  form 
a.aaa...Ec,  and  we  know  the  characteristic  c.  Our  example  22.145  has  become 
2.2345E1  with  c  =  l.  What  we  need  for  the  approximation  in  equation  (1)  is 
.aaa...  as  we  are  going  to  compute  ln( 1  * . aaa . . .  ) .  Therefore,  we  set 

C1=$EXTRACT(2 . 2345)  which  gives  us  the  2  in  front  of  the  decimal  and  divide 
2.2345  by  Cl ( =2 )  to  obtain  1.11725  from  which  we  subtract  1  yielding  .11725. 
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Ue  are  now  ready  to  evaluate  ln(l+.  11725)  with  x-.  11  725.  Line  LnX  +  1 
multiplies  the  terms,  sums  them,  and  places  the  result  in  L.  The  last  chore 
is  to  untrack  L=ln(l. 11725)  as  we  are  seeking  ln(22.345).  What  we  did, 
actually,  was  first  divide  22.345  by  10l  (where  c  was  1),  then  we  divided  by 
Cl  (where  Cl  was  2);  so  I.  really  is  the  value  of  ln(22 . 345/Cl*10C)  or  the 
ln(22 . 345/2*10) .  By  generally  accepted  rules 

L  =  Inl22.345/(C1*C) J  =  ln(22.345)  ln(Cl*C) 

=  ln(22.345)  -  [In  Cl+ln  C[ 

=  In  22.345  -  In  Cl-ln  C 

but  C  is  actually  10  ,  so 

L  =  In  22.345  In  Cl  -  C  In  10 
Shifting  the  terms  about,  our  answer  has  the  lotm 
Ln  22.345  =  L  +  In  Cl  *  C  In  10 

-  .11087  t  .693147  e  (1*2.302585) 

=  3.1066 

The  only  item  left  needing  explanation  is  tl  which  takes  on  the  values 
0,1, 2... 9  and  these  Ins  are  in  N(1)...N(9).  The  code  in  LnX+3  reads 

ln  Z=$  J  ( ( L+N  ( Cl )  +  (  C*2 . 30 . . . )),10,8) 

The  Briggsian  or  common  Log  of  N  is  simply  the  constant  .434...  times  the  ln 
X.  Plainly,  the  efficacy  of  the  process  lies  in  the  string  functions  $F,  $L, 
and  $E. 

Computing  the  antilog  e*  is  more  straight  forward.  Given  the  natural 
logarithm  of  N,  what  is  N?  This  time  we  "borrow"  our  polynomial  approximation 

ftom  Messrs.  B.  Carlson  and  M.  Goldstein  ol  thv  Los  A  linos  Sciintific  Labora- 

„  3 

tory  : 


x  .  2 

e  =  1  +  ajX  i  a2x  i 

3 

a3X 

7  , 

. ..  a^x  r  e(x) 

(2) 

where  0<x<ln2,  |e(x)l<2E-10  and 

aj  =  .99999  99995 

a5 

.00830  13598 

a2  =  .49999  99206 

a6 

.00137  98820 

-  -.16666  53019 

n/ 

.00014  13161 

a.  .04165  73475 

4 

Note  that  great  cnie  vas  taken  if'  ei)' 

ui  e  i!" 

■  -r  i  i  on  J  .K  nil  ary 

i  n  t  lie  l  cm  t  i  ne 

EXPX.  The  E(0)  to  E(20)  values  (e"  where  x  ".1.2...  20  respectively)  are  13 
digits  to  ensure  our  2  parts  in  10  billion  ac-uiacy.  TIhuc  arc  two  points  to 
note  about  the  polynomial  approximation:  1)  ate  finding  e  "  so  we  •mis’ 


compute  the  reciprocal  of  equation  2  eventually;  and  2)  the  polynomial  is 
accurate  with  augments  less  than  or  equal  to  In  2  (.6931471). 

The  first  part  of  EXPX  (see  figure  2)  is  self  explanatory.  We  accept  only 
N  less  than  20  in  SEL,  as  larger  N  cause  errors  greater  than  +  2E-10.  Since 
negative  N  are  fair  game,  we  set  switch  SW2=1,  make  N  positive,  and  take  the 
reciprocal  later  for  them. 

At  DECIMAI/EXPX  we  start  to  explore  our  N.  If  N  is  negative  and  M=0  (i.e. 
$F  found  no  decimal  point),  we  simply  print  the  appropriate  reciprocal  of  E(N) 
(i.e.  1/E(N)).  If  M  =  0  and  N  is  positive,  we  again  print  the  answer  E(N) 
immediately.  IF  M>2,  we  set  C  =  M-2  to  capture  the  number  of  digits  before 
the  decimal;  set  CC  =  $E(N,1,C)  to  obtain  the  actual  number  before  the  decimal 
point  (remember  it  can  only  be  one  of  the  set  2.3, 4... 20);  and  set  MM  =  N-CC 
to  obtain  the  decimal  fraction.  If  M=2,  indicating  a  number  less  than  one,  we 
set  MM  =  N,  and  CC  =  0. 


EXPX  ;  e  RAISED  TO  POWER  X  (X<=20) ,DRH,NHRC, 1/12/88 

; POLYNOMIAL  APPROXIMATION  -  ABSOLUTE  ERROR  LESS  THAN  OR  EQUAL  TO 

2*E-10 

S  E(0)=1,E<1)=2. 71828182846, E<2)=7. 3890560989, E(3)=20. 0855369232, 
F.(4)=54. 5981500331, E(5)=148. 413159103, E(6)=403. 428793493, E(7)=1096. 63315843, 

E( 8 )=29au. 95798704, E( 9 ) =8 103. 03392 758, E( 19) =22026  4657948, E(ll)=59874. 1417152, 
E( 12 )=162754. 791429, E( 13) =442413. 392009, E( 14) =1202604. 28416, E( 15) =326901 7 . 3724 
7, E(16)=8886110. 52051, E(17)=24154952. 7536, E( 18 )=65659969. 1373, E(19)=l 78482300. 
963, E( 20) =4851 65195. 41 

SEL  R  X,Z  S  SV=0, SW2=0  R  ! ! , "WHAT  ARGUMENT  (X<=20)  ?  ",X  I  X=""  K 

SW ,  SW2 ,  E  Q 

I  X=1  V  ! ,  "EXP  1  .36787944117"  0  SF.l. 

I  X<0  S  SW2=1 , X=-X 
I  X=0  W  ! , "EXP  0  =  1"  G  SEL 

I  X=1  W  ! , " EXP  1  =  2.71828182846  WHICH  IS  ""e"""  G  SEL 
DECIMAL  S  Z=X,M=$F(Z, " . " ) 

I  (SW2)£>(M=0)  W  ! !  ,"EXP  ",-Z,"  =  " ,$J(  1 /E(Z)  ,  10, 10)  G  SEL 
I  M=0  W  ! , "EXP  ",Z,"  -  " , E ( Z )  G  SEL 
I  M>2  S  C=M-2,CC=$E(Z,1,C),MM=Z  CC 
I  M=2  S  MM=Z , CC=0 
I  MM>. 6931 4718056  S  MM--MM/2,SW  1 
POLY  S  T(0)=1 ,T( 1)=MM  F  J=l:l:7  S  T(J)=T(1 )*T(J  1 ) 

S  EX=l-( . 9999999995*T( 1 ) ) *- ( . 4999999206*T( 2 ) )  ( . 1 66665301 9*T( 3) ) t 
( .0416573475*T(4) )- ( . 008301 3598*T( 5) )  +  ( .00132988Z*T(6) )-  ( . 0001413161*T( 7 ) ) 

I  SW  S  EX=EX*EX 

I  SW2  W  ! ! , "EXP  ",  X,"  =  ",$J(1/((1/EX)*E(CC)),10,10)  G  KL 
W  ! ! , "EXP  ",X,"  =  " ,$J( ( 1 /EX)*E(CC) , 10. 10) 

KL  K  C ,  CC ,  EX ,  J ,  M ,  MM ,  SW ,  SW2  ,  T ,  X ,  Z  C,  SEL 

_ Q_  _ _ _  _  _  _ _ _  _ _ _ 

X 

pigure  2.  e 


Now  we  must  check  to  see  if  the  decimal  fraction  is  less  than  In  2.  If  it 

is  not,  we  divide  by  2  and  set  switch  SW  =  1.  In  POLY  we  perform  exactly  the 

operations  as  in  LOG,  placing  the  evaluation  of  the  polynomial  in  EX.  If  SW  = 

-MM/2  -MM/2  -MM 

1,  we  undo  our  division  by  2  by  setting  EX  =  EX*EX  as  e  *  e  =  e 

If  SW2  =  1,  we  change  the  sign  of  N  back  to  -1  and  print  the  reciprocal  of  the 
reciprocal  of  EX  times  E(CC).  Nonnegative  N  are  printed  as  the  reciprocal  of 
EX  times  E(CC)  as  EX  is  e  x  if  you  recall. 

3.415 

As  an  example,  let  us  find  the  antilog  of  3.415  (i.e.,  evaluate  e 

o  /IS 

which  is  eJ  *  e’  ).  We  would  read  E(3)  and  evaluate  EX  for  .415  and  mul¬ 
tiply  E ( 3 )  times  EX  for  the  answer  20.0855369232  ■  1.5143707  =  30.417. 

The  third  routine,  NORMDIST,  is  the  least  accurate,  but  not  to  worry,  we 
will  replace  it  later.  It  is  the  most  interesting  of  the  three  -  actually 
what  this  is  all  about.  The  standardized  noima!  random  variable  probability 


function  is 


f(2)  = 


2  / 

e  z/  Z 


which  is  arrived  at  by  setting  the  mean  u=0  and  standaid  deviation  a=  i  in  the 
Gaussian  distribution 


f(x)  = 


V  2Tla 


If  we  know  the  mean  u  and  the  standard  deviation  a  of  a  population  wc  can  use 
(3)  by  transforming  out  seer"  x  to  a  standard  i  ..cd  normal  variable  z-(>:  j)/o. 

For  example,  if  we  have  a  population  mean  n=2.5  and  standard  deviation 
a=1.5  and  select  a  subject  whose  taw  score  is  a. 96,  what  is  the  probability 
this  score  is  greater  than  zero  and  less  than  or  equal  to  4.96?  First, 

z  =  (4.96  2.5)/(1.5)  -  1.64. 


The  answer  is 


P(0<Z<1 .64) 


.1.3448, 


^2F  oJ 


the  area  under  the  density  function  from  o  to  1  .  ■  - . 

NORMDIST  simply  produces  the  familiar  tabu:  t:  values  found  at  the  back  of 
every  statistics  text.  Knowing  the  moan  and  '■tandard  deviation  of  any  Gaus 


'>  \>  wj»  ” 
to 
to 


sian  distribution,  one  can  translate  any  measure  to  the  standardized  normal 
variable  by  z  =  (x-u)/o  and  plug  it  into  NORMDIST  to  get: 

i)  P(0<Z<z) 

ii)  P(Z>z) 

iii)  P(Z<z) 

iv)  P( | Z | <z) 

The  values  returned  are  in  error  to  the  extent  of  +5  units  in  the  fourth 
decimal  digit. 

— x2 

The  numerical  analysis  is  indirect.  It  turns  out  the  integral  of  e  dx 
cannot  be  integrated  in  finite  terms.  Ue  could,  as  with  eX,  use  a  Taylor's 

series  expansion  around  points  of  interest  to  ensure  convergence,  but  it  is 

2 

much  easier  to  pluck  Hastings'  brain.  The  error  function 


erf  z  = 


is  close  to  what  we  are  aftei;  namely, 


v2 

>‘x  dx 


'/2TL 


c  2 

j  t~x/2  dt 


Hastings  has  approximated  the  erf  with  a  rational  approximation 


erf  x  =  1  -  1/(1  +  a^x  a  +  a^x^  *  a^x^) *  ♦  e(X) 


X  real  and 


|  e(X)|<  5E-4  whet 


aj  =  .278393 
a3  =  .000972 


a2  .230389 

a.  -  .0/8108 

4 


Ue  must  make  the  transformation 


-!2/2  =  -X2 


let  ting 


dt=  '^2  dx 


I 


\ 


?  Z 

substituting  -x c  for  -  t  an(f  '/2~  dx  for  dt  in  equation  (6)  we  arrive  at 


f(v)  =  -^|=r  J  dx  ) 


which  reduces  to 


1  rv  „  2 
f(v)  =  f  e  ’x  & 

VTt 


The  expression  (8)  differs  from  the  appi ox i ina t i on  (7)  lor  the  erf  (5)  by  a 


factor  of  2 


2  1 
15  tvice 

ti  V  n 


so  we  must  divide  equation  (7)  by  2  for  the  result  in  line  SEL+6  in  NORMDIST : 

S  ANS  =  $J((l-(l/S4))/2,6,4) 

Turning  to  NORMDIST  (see  figure  1),  we  place  in  the  A ( i )  the  coefficients 

of  the  rational  approximation  (7).  Entering  an  M  at  SEE,  we  make  oui  variable 

1/2 

transformation  by  setting  X  -  N/(?  ').  After  computing  the  power  terms  T(J), 

multiplying  the  A  ( i )  *T  ( J )  *  adding  1  and  summing,  the  sum  is  Laised  to  the 
fourth  power.  Following  the  form  of  equation  ( 7 )  we  take  tire  reciprocal  of 
the  sum,  subtract  the  result  from  1  and  di'  ide  by  2  to  reconcile  the 
multiplicative  terms  of  the  integral. 

Discussion 

There  arc  three  pedestrian  matters  to  dispel  of.  First,  the  accuracy  of 
erf  is  not  consistent  with  I.Of,  and  FXPX.  The  re  <d<i  is  uigecl  to  use  dustings' 
rational  approximation 

p  6  16 

erf  x  -  1  1/(1  t  a^x  t  a^x  t  a^x^  ...  a^x  )  <  e(X) 

0  <  x  <  ®  where  | r( X ) !  <  3E  7  and 


ax  =  .07052  30784  a2  =  .04228  20123 

a3  =  .00927  05272  a 4  =  .00015  20143 

a5  =  .00027  65672  a6  =  .00004  30638 

This  expression  was  not  used  originally,  as  there  was  no  reasonable  way  to  get 

the  16th  power  without  risking  grievious  errors  -  motivating  writing  the  LOG 
and  EXPX  routines. 


NORMDIST  ; STANDARDIZED  NORMAL  VARIABLE  DISTRIBUTION, DRH,NHRC, 1/1 2/88 
; ABSOLUTE  ERROR  IN  Z  LESS  THAN  OR  EQUAL  TO  5*10E-4 
S  A(l)=. 278393, A(2)=. 230389, A(3)=. 000972, A(4)=. 078108 
SEL  S  SN=1  R  !!, "ENTER  z  SCORE  :  ",Z  I  Z=""  K.  A,Z,ANS,SN  Q 

I  Z<0  S  Z=-Z, SN=-1 
S  X=Z/1. 4142136, S=0 

S  T(0)=1 ,T(1)=X  F  J=l:l :4  S  T(J)=T(1)*T( J-l ) 

F  J=1 : 1 : 4  S  S=S+(A(J)*T(J)) 

S  S=S+1 ,  S4=-S*S*S*S 

S  R=(l-( 1/S4))/1. 1283792, ANS=$J<R*. 39894 72*1. 4147136, 6, 4) 

ANS  V  !!,"IF  Z  IS  THE  STANDARD  NORMAL  RANDOM  VARIABLE  ",SN*Z,"  THEN 

SN<1  G  NEG 

,"P(0  <  Z  <  ",Z,")=  " , ANS 
,"P(Z  >  ",Z,")=  ",.5-ANS 

,"P(Z  <  ",Z,")=  ", .5+ANS,?33,"(NOTE:  ERROR  IN  4TH  DIGIT  +5)" 
',"P(|Z|  <  ",Z,")=  " , 2*ANS  G  KL 


I 

U 

V 

V 

V 

NEG  V 

V 

V 

V 

KL  K  J,R,S,S4,T,X,Z  G  SEL 

Q 


’P(Z  <  ",-Z,")=  ",.5-ANS 
,"P(Z  >  ",-Z,")=  ",.5+ANS 
,"P(",-Z,"  <  Z  <  0)=  " , ANS 

’P(|Z|  >  ",-Z,")=  " ,2*ANS, ?33, "(NOTE:  ERROR  IN  4TH  DIGIT  +5)* 


Figure  3.  The  Standardized  Normal  Variable  Distribution 


Enter  the  new  A(i),  change  SEL+3  to  J=1:1:G.  add  1  to  S,  and  take  the 
ln(S+l ) : 

N=(S^1)16 
In  N  =  161n(S+l ) 

(You  will  have  to  change  LOG  and  EXPX  to  opet,’hc  as  subroutines:  i.e.  instead 
of  W  !,  you  must  set  "ANS1  =  ”  to  the  value  computed  by  Lot;,  etc.) 

S  -  (S+l)  D'LOG  S  S3- ANS 1*1 6 

S  X=S3  D" EXPX  S  S4r  ANS7 

S3 

N  is  ANS?  from  EXPX  which  computed  e  '  for  von-  The  rest  of  NORMDIST  remains 
unchanged  and  you  have  now  achieved  3  parts  in  ten  million  accuracy:  your 


answer  is  in  error  +  3  units  in  the  seventh  decimal  place. 

2 

If  the  erf  is  of  no  interest,  Hastings'  rational  approximation 
P(x)  =  1  -  1/2(1  +  Fxx  +  F2x2  +  F3x3  ...  F6x6)  16  +  e(x) 
where  |e(x)|  <  1.5x10  2  and 


Fx  =  .04986  73470 
F2  =  .02114  10061 
F„  =  .00327  76263 


F.  =  .00003  80036 
4 

F5  =  .00004  88906 
F6  =  .00000  53830 


will  produce 


h  s 

l  It 


-t2/2  a- 
e  dt 


to  an  accuracy  of  one  and  one  half  parts  in  ten  million.  The  code  for  (9)  is 
displayed  in  the  routine  NORM  (see  figure  4).  Note  the  line  SEL  +  6  which 
assumes  our  LOG  and  EXPX  routines  have  been  concerted  to  subroutines  LGN  and 
EXX  respectively.  Following  the  style  of  FM  we  enter  subroutines  with  x  and 
come  out  with  y. 

Second,  the  use  of  the  error  function  to  evaluate  the  standardized  normal 
variable  function  does  not  preclude  us  fiom  tiding  it,  also,  as  part  of  our 
armament.  The  erf (ha)  is  the  probability  that  the  error  of  a  single  measure¬ 
ment  lies  between  +  a,  where  h  is  the  precision  index.  Those  using  a  digita. 
approach  to  neural  nets  will  appreciate  the  utility  of  both  the  error  function 
and  the  normal  probability  density  function  in  adjusting  weights  in  logical 
threshold  units. 

Third,  standardizing  a  i andom  variable  is.  in  itself,  a  powerful  ploy. 
Setting  z=(x-u)/ct  conveys  a  wealth  of  information  about  the  location  and 
status  of  a  score  in  the  distribution,  if  one  keeps  in  mind  that  the  distri¬ 
bution  mean  is  zero  and  the  standard  deviation  is  one.  Negative  z  are  less 
than  the  population  mean  -  positive  z  greatei.  Given  a  z  ol  1.64  you  know 
instantly  this  is  a  score  almost  two  standard  deviations  above  the  mean  of  the 
population  -  a  reasonably  rare  event.  Rixty-eigh '  percent  of  the  d is t i i bu t i on 
lies  between  +1  standard  deviations.  Ninety  fi  s.  percent  between  *3  standard 
deviations.  Knowing  these  relationships  and  u  ing  NORMDIST  can  rationalize  a 
myriad  of  guessing  situations. 

Great  care  has  been  taken  to  make  thest  routines,  and  those  previously 
developed,  as  accuiate  as  practically  possible.  The  concept  of  algorithms  for 
small  samples  is  not  whimsical.  ff  one  has  a  mall  sample  with  missing  data 


ErCi 


and  assumes  an  N  which  counts  missing  data  as  present,  the  mean  will  consis¬ 
tently  be  underestimated,  a  serious  matter  in  the  health  care  business. 


NORM  ; STANDARDIZED  NORMAL  VARIABLE  DISTRIBUTION, DRH.NHRC, 1/12/88 
; ABSOLUTE  ERROR  IN  Z  LESS  THAN  OR  EQUAL  TO  1.5*10E-7 
S  F( 1 )=. 049867347, F( 2 )=. 021141006 1,P( 3)-. 0032776263, F( 4 >=.0000380036, 
F ( 5 )  = . 0000488 906 , F ( 6 ) = . 000005383 


S  SN=1  R  !!, "ENTER  z  SCORE  :  ",Z  I  Z=""  K  A,Z,PZ,SN  Q 
I  Z>6  V  ! , "OUT  OF  RANGE-  SELECT  A  ""z""  LESS  THAN  OR  EQUAL  TO  6  "  G 


I  Z<0  S  Z=-Z, SN=-1 
S  (X,ZZ)=Z,S=0 

S  T(0)=1 ,T(1)=X  F  J=1 : 1 : 6  S  T(J)=T(1)*T(J-1) 

F  J=1 : 1 : 6  S  S=S+(F( J)*T( J) ) 

S  S=S+1,X=S  D  'LGN  S  S3=16*Y,X=S3  D  ‘EXX  S  S4  =  Y 
S  R=( 1-(1/(2*S4 ) ) ) , PZ=$J(R, 9 , 7 ) 

V  ! ! , "IF  Z  IS  THE  STANDARDIZED  NORMAL  RANDOM  VARIABLE  AND  z  = 
" , SN*ZZ, "  THEN 

I  SN<1  G  NEG 


V  ! , "P(0  <  Z  <  ",ZZ,")=  " , PZ- . 5 

V  ! , "P(Z  >  ",ZZ,")=  ",1-PZ 

V  ! , "P(Z  <  ",ZZ,")=  ",PZ 

V  ! , "P( | Z |  <  ",ZZ,")=  " , 2*(PZ-.5)  G  XL 

V  ! , "P(Z  <  ",-ZZ,")=  ",1-PZ 

V  ! , "P(Z  >  ",-ZZ,")=  ",PZ 

V  ! ,"P(",-ZZ,"  <  Z  <  0)=  " , PZ- . 5 

V  !,"P(|Z|  >  ",-ZZ,")=  " ,2*(PZ-.5) 

K  J,R,S,S4,T,X,Z  G  SEL 

Q 


Figure  4.  Precision  Standardized  Normal  Variable  Distribution 


Large  samples  take  care  of  themselves.  The  lav  of  large  numbers,  oi  the 
central  limit  theorem,  grant  reprieve  to  shoddy  data  analysis  practices.  In 
the  FM  descriptive  statistics  programs,  ea<  h  field  is  restricted  to  numeric 
values  in  a  definite  range  for  that  domain,  ensuring  data  attribute  integrity 
The  existence  or  nonexistence  of  an  entity  in  that  field  is  ascertained  and 
only  then  is  N  augmented  or  decreased.  These  are  minimal  mechanical  safe 
guards . 

When  looking  at  the  normal  distribution  function 


f(x.  p,C  2  )  = 


/  2/  ? 
e  ‘  (x~P-)/2  o  2 


it  becomes  apparent  that  one  must  minimize  the  error  in  (x  ji)‘  in  ordei  to 
minimize  the  error  in  the  integration  process.  This  is  why  re  needed  to 


I 


Ply 

till 


>>>>>>■-* 


y. 

■-cvcS 


JTy’W*  yvi-if  lTVT  -r  l-v-.T,;,.-.  .v.T 


develop  LOG  and  EXPX  to  write  a  reasonable  standardized  normal  distribution 
function.  Multiplying  a  slight  error  in  X  sixteen  times  is  untenable  (for 
example  .99  to  the  sixteenth  power  =  .85).  On  the  other  hand,  underestimating 
u  by  including  missing  data  in  the  count  is  equally  serious  and  very  mislead¬ 
ing  in  small  samples. 

The  DEC  machines  have  18  decimal  digit  accuracy  (they  actually  carry  19 

digits)  with  64  bit  precision.  This  is  automatic  double  precision  arithmetic. 

A  VAX  750  running  under  the  UNIX  operating  system  scored  the  lowest  error 

rating  in  a  PC  Tech  Journal  accuracy  benchmarking  test  based  on  stringent 
4 

numerical  criteria 

With  accurate  algorithms  and  accurate  machines,  all  that  remains  is  an 
accurate  compiler.  The  MUMPS  community  must  pay  attention  to  the  IEEE  p. 
754/854  standards  for  numerical  computation  which  define  procedures  for 
dealing  with  a  discontinuous  number  space.  If  MUMPS  is  to  gain  the  preemi¬ 
nence  it  deserves,  it  must  handle  numbers  with  precision  and  efficiency. 
Some  companies  utilizing  co-processors  are  certainly  headed  in  the  right 
direction,  producing  native  mode  machine  code  and  using  runtime  systems  that 
handle  indirection  and  the  Xecute  command"1. 

The  one  fly  in  the  ointment  of  the  MUMPS  language  itself  is  the  order  of 
arithmetic  operation  in  an  expression.  Countless  hours  have  been  spent  disco¬ 
vering  MUMPS  has  left  to  right  arithmetical  precedence!  Otherwise,  after  many 
years  of  massaging  numbers,  it  can  be  truthfully  reported  that  doing  numerical 
analysis  with  MUMPS  is  a  pleasure.  I/O  is  the  easiest  of  any  language  used. 
The  string  manipulators  are  without  parallel  in  examining  numbers.  The  fact 
that  one  can  simulate  the  normal  probability  density  integral  in  four  or  five 
lines  of  code  speaks  for  itself. 
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